@codeflash-ai codeflash-ai bot commented Jul 23, 2025

⚡️ This pull request contains optimizations for PR #572

If you approve this dependent PR, these changes will be merged into the original PR branch fix/is-repo-a-fork.

This PR will be automatically closed if the original PR is merged.


📄 14% (0.14x) speedup for is_repo_a_fork in codeflash/code_utils/env_utils.py

⏱️ Runtime: 66.0 microseconds → 57.8 microseconds (best of 77 runs)
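
The headline figure and the per-test percentages in the generated test listing appear consistent with (original − optimized) / optimized. The snippet below is a small arithmetic check under that assumption; the helper name `speedup_percent` is hypothetical, and the formula is inferred from the reported numbers rather than taken from Codeflash's documentation.

```python
def speedup_percent(original: float, optimized: float) -> float:
    """Relative speedup in percent, assuming (original - optimized) / optimized."""
    return (original - optimized) / optimized * 100


# Overall runtime reported in this PR: 66.0 microseconds -> 57.8 microseconds.
print(round(speedup_percent(66.0, 57.8), 1))  # 14.2

# First generated test: 1.41 microseconds -> 1.21 microseconds, reported as 16.5% faster.
print(round(speedup_percent(1.41, 1.21), 1))  # 16.5
```

Per-test values can differ from the reported percentages in the last digit, presumably because the displayed timings are themselves rounded.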

📝 Explanation and details

Here's how you can optimize the provided program for speed.

Optimizations

  1. Avoid pathlib when not necessary. open() is marginally faster and incurs less overhead than using Path(...).open().
  2. Read os.environ directly instead of calling os.getenv for a (very minor) speedup, falling back to {} if the variable is not set.
  3. Minimize nested .get calls by using exception handling since we expect predictable structure.

Here’s the faster version.

Notes:

  • The return value is exactly the same as before.
  • All function/method names and signatures are unchanged.
  • Your original comments are preserved because the code is otherwise intact; only the note about open changed.
  • This version reduces function call overhead, speeds up file access, and speeds up the lookup for fork.

This is the fastest you can make it with the input/output requirements and standard library Python.
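
The three changes listed under Optimizations can be illustrated with a sketch. This is not the code from this PR: the function names `fork_flag_before` and `fork_flag_after` are hypothetical, and the "before" shape is only an assumption about what a `pathlib` plus chained `.get` implementation might look like. Neither function caches anything.

```python
from __future__ import annotations

import json
import os
from pathlib import Path
from typing import Any


def fork_flag_before(env_var: str = "GITHUB_EVENT_PATH") -> bool:
    """Hypothetical 'before' shape: os.getenv, pathlib, chained .get calls."""
    event_path = os.getenv(env_var)
    if not event_path:
        return False
    with Path(event_path).open() as f:
        event: dict[str, Any] = json.load(f)
    repo = event.get("pull_request", {}).get("head", {}).get("repo", {})
    return bool(repo.get("fork", False))


def fork_flag_after(env_var: str = "GITHUB_EVENT_PATH") -> bool:
    """Hypothetical 'after' shape: os.environ, built-in open, try/except lookup."""
    event_path = os.environ.get(env_var)
    if not event_path:
        return False
    with open(event_path) as f:
        event = json.load(f)
    try:
        # Direct indexing avoids building default dicts on the common path.
        return bool(event["pull_request"]["head"]["repo"]["fork"])
    except (KeyError, TypeError):
        # A missing key, or a non-dict value somewhere along the path.
        return False
```

One design note: the two shapes agree whenever every level of the path is a dict or is absent, which is what the generated tests exercise. They can diverge on malformed payloads (for example a `null` at an intermediate level), so "the return value is exactly the same" holds for the tested inputs rather than for arbitrary JSON.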

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 47 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 85.7% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

import json
import os
import tempfile
from functools import lru_cache
from pathlib import Path
from typing import Any

# imports
import pytest  # used for our unit tests
from codeflash.code_utils.env_utils import is_repo_a_fork


def write_event_file(tmp_path, event_data):
    """Helper to write event data to a temporary file and return its path."""
    event_file = tmp_path / "event.json"
    event_file.write_text(json.dumps(event_data))
    return str(event_file)

# ------------------------------
# 1. Basic Test Cases
# ------------------------------

def test_fork_true(monkeypatch, tmp_path):
    # Basic: fork is True
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    "fork": True
                }
            }
        }
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.41μs -> 1.21μs (16.5% faster)

def test_fork_false(monkeypatch, tmp_path):
    # Basic: fork is False
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    "fork": False
                }
            }
        }
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.35μs -> 1.28μs (5.54% faster)

def test_fork_missing(monkeypatch, tmp_path):
    # Basic: fork key missing, should default to False
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    # 'fork' key is missing
                }
            }
        }
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.39μs -> 1.23μs (13.0% faster)

# ------------------------------
# 2. Edge Test Cases
# ------------------------------

def test_no_github_event_path_env(monkeypatch):
    # Edge: GITHUB_EVENT_PATH env not set
    monkeypatch.delenv("GITHUB_EVENT_PATH", raising=False)
    codeflash_output = is_repo_a_fork() # 1.16μs -> 972ns (19.5% faster)


def test_pull_request_key_missing(monkeypatch, tmp_path):
    # Edge: 'pull_request' key is missing
    event = {
        "something_else": {}
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.49μs -> 1.33μs (12.0% faster)

def test_head_key_missing(monkeypatch, tmp_path):
    # Edge: 'head' key is missing
    event = {
        "pull_request": {
            # 'head' missing
        }
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.43μs -> 1.31μs (9.06% faster)

def test_repo_key_missing(monkeypatch, tmp_path):
    # Edge: 'repo' key is missing
    event = {
        "pull_request": {
            "head": {
                # 'repo' missing
            }
        }
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.38μs -> 1.29μs (7.04% faster)

def test_fork_null(monkeypatch, tmp_path):
    # Edge: fork is None
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    "fork": None
                }
            }
        }
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.36μs -> 1.27μs (7.15% faster)

def test_fork_string_true(monkeypatch, tmp_path):
    # Edge: fork is string "true" (should be truthy)
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    "fork": "true"
                }
            }
        }
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    # bool("true") is True
    codeflash_output = is_repo_a_fork() # 1.35μs -> 1.27μs (6.37% faster)

def test_fork_string_false(monkeypatch, tmp_path):
    # Edge: fork is string "false" (should be truthy, since non-empty string)
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    "fork": "false"
                }
            }
        }
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.45μs -> 1.26μs (15.1% faster)

def test_fork_empty_string(monkeypatch, tmp_path):
    # Edge: fork is an empty string (should be False)
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    "fork": ""
                }
            }
        }
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.43μs -> 1.25μs (14.5% faster)

def test_fork_zero(monkeypatch, tmp_path):
    # Edge: fork is 0 (should be False)
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    "fork": 0
                }
            }
        }
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.45μs -> 1.24μs (16.9% faster)

def test_fork_one(monkeypatch, tmp_path):
    # Edge: fork is 1 (should be True)
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    "fork": 1
                }
            }
        }
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.48μs -> 1.29μs (14.7% faster)

def test_fork_list(monkeypatch, tmp_path):
    # Edge: fork is a non-empty list (should be True)
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    "fork": [1]
                }
            }
        }
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.48μs -> 1.24μs (19.3% faster)

def test_fork_empty_list(monkeypatch, tmp_path):
    # Edge: fork is an empty list (should be False)
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    "fork": []
                }
            }
        }
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.40μs -> 1.26μs (11.0% faster)

def test_fork_dict(monkeypatch, tmp_path):
    # Edge: fork is a non-empty dict (should be True)
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    "fork": {"foo": "bar"}
                }
            }
        }
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.39μs -> 1.22μs (13.9% faster)

def test_fork_empty_dict(monkeypatch, tmp_path):
    # Edge: fork is an empty dict (should be False)
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    "fork": {}
                }
            }
        }
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.43μs -> 1.22μs (17.2% faster)

# ------------------------------
# 3. Large Scale Test Cases
# ------------------------------

def test_large_event_data_fork_true(monkeypatch, tmp_path):
    # Large: lots of unrelated data, but fork is True
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    "fork": True,
                    "other_data": [i for i in range(500)]
                }
            }
        },
        "unrelated": {str(i): i for i in range(500)}
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.41μs -> 1.30μs (8.53% faster)

def test_large_event_data_fork_false(monkeypatch, tmp_path):
    # Large: lots of unrelated data, but fork is False
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    "fork": False,
                    "other_data": [i for i in range(500)]
                }
            }
        },
        "unrelated": {str(i): i for i in range(500)}
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.43μs -> 1.28μs (11.7% faster)

def test_large_nested_structure_fork_true(monkeypatch, tmp_path):
    # Large: deeply nested structure, fork is True
    nested = {}
    curr = nested
    for i in range(100):
        curr["level"] = {}
        curr = curr["level"]
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    "fork": True
                }
            }
        },
        "nested": nested
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.45μs -> 1.28μs (13.3% faster)

def test_many_pull_requests(monkeypatch, tmp_path):
    # Large: many pull_request objects, only one is used
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    "fork": True
                }
            }
        }
    }
    # Add lots of unrelated pull_request_N keys
    for i in range(900):
        event[f"pull_request_{i}"] = {"head": {"repo": {"fork": False}}}
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.55μs -> 1.34μs (15.7% faster)

def test_large_file_with_missing_fork(monkeypatch, tmp_path):
    # Large: large file, but 'fork' missing
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    # fork missing
                    "other": [i for i in range(900)]
                }
            }
        },
        "big": [i for i in range(900)]
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.38μs -> 1.30μs (6.22% faster)

def test_large_file_with_fork_false(monkeypatch, tmp_path):
    # Large: large file, fork is False
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    "fork": False,
                    "other": [i for i in range(900)]
                }
            }
        },
        "big": [i for i in range(900)]
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.43μs -> 1.25μs (14.5% faster)

def test_large_file_with_fork_true(monkeypatch, tmp_path):
    # Large: large file, fork is True
    event = {
        "pull_request": {
            "head": {
                "repo": {
                    "fork": True,
                    "other": [i for i in range(900)]
                }
            }
        },
        "big": [i for i in range(900)]
    }
    event_path = write_event_file(tmp_path, event)
    monkeypatch.setenv("GITHUB_EVENT_PATH", event_path)
    codeflash_output = is_repo_a_fork() # 1.42μs -> 1.27μs (11.9% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from __future__ import annotations

import json
import os
import tempfile
from functools import lru_cache
from pathlib import Path
from typing import Any

# imports
import pytest  # used for our unit tests
from codeflash.code_utils.env_utils import is_repo_a_fork


def write_event_file(event_data: dict) -> str:
    # Helper to write event data to a temp file and set env var.
    tmp = tempfile.NamedTemporaryFile("w+", delete=False)
    json.dump(event_data, tmp)
    tmp.flush()
    os.environ["GITHUB_EVENT_PATH"] = tmp.name
    tmp.close()
    return tmp.name

def remove_event_file(filename: str):
    # Helper to remove temp event file; ignore only filesystem errors
    # (for example, the file was already deleted).
    try:
        os.remove(filename)
    except OSError:
        pass

def unset_event_path():
    # Helper to unset the env variable
    if "GITHUB_EVENT_PATH" in os.environ:
        del os.environ["GITHUB_EVENT_PATH"]

# --- BASIC TEST CASES ---

def test_fork_true():
    # Basic: PR from a forked repo
    fname = write_event_file({
        "pull_request": {
            "head": {
                "repo": {
                    "fork": True
                }
            }
        }
    })
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()

def test_fork_false():
    # Basic: PR from a non-fork repo
    fname = write_event_file({
        "pull_request": {
            "head": {
                "repo": {
                    "fork": False
                }
            }
        }
    })
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()

def test_fork_missing_key():
    # Basic: PR where 'fork' key is missing (should default to False)
    fname = write_event_file({
        "pull_request": {
            "head": {
                "repo": {
                    # no 'fork' key
                }
            }
        }
    })
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()

def test_fork_key_is_none():
    # Basic: 'fork' key present but value is None (should treat as False)
    fname = write_event_file({
        "pull_request": {
            "head": {
                "repo": {
                    "fork": None
                }
            }
        }
    })
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()

# --- EDGE TEST CASES ---

def test_no_github_event_path_env():
    # Edge: GITHUB_EVENT_PATH is not set at all
    unset_event_path()
    codeflash_output = is_repo_a_fork() # 1.14μs -> 941ns (21.4% faster)


def test_pull_request_key_missing():
    # Edge: No 'pull_request' key in event data
    fname = write_event_file({
        "something_else": {}
    })
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()

def test_head_key_missing():
    # Edge: No 'head' key under 'pull_request'
    fname = write_event_file({
        "pull_request": {
            "base": {}
        }
    })
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()

def test_repo_key_missing():
    # Edge: No 'repo' key under 'head'
    fname = write_event_file({
        "pull_request": {
            "head": {
                "ref": "main"
            }
        }
    })
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()

def test_fork_key_is_string_true():
    # Edge: 'fork' key is string "true" (should be truthy)
    fname = write_event_file({
        "pull_request": {
            "head": {
                "repo": {
                    "fork": "true"
                }
            }
        }
    })
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()

def test_fork_key_is_string_false():
    # Edge: 'fork' key is string "false" (should be truthy, non-empty string)
    fname = write_event_file({
        "pull_request": {
            "head": {
                "repo": {
                    "fork": "false"
                }
            }
        }
    })
    try:
        codeflash_output = is_repo_a_fork()  # non-empty string is truthy
    finally:
        remove_event_file(fname)
        unset_event_path()

def test_fork_key_is_empty_string():
    # Edge: 'fork' key is empty string (should be falsy)
    fname = write_event_file({
        "pull_request": {
            "head": {
                "repo": {
                    "fork": ""
                }
            }
        }
    })
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()

def test_fork_key_is_integer_zero():
    # Edge: 'fork' key is 0 (should be falsy)
    fname = write_event_file({
        "pull_request": {
            "head": {
                "repo": {
                    "fork": 0
                }
            }
        }
    })
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()

def test_fork_key_is_integer_one():
    # Edge: 'fork' key is 1 (should be truthy)
    fname = write_event_file({
        "pull_request": {
            "head": {
                "repo": {
                    "fork": 1
                }
            }
        }
    })
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()

def test_fork_key_is_list():
    # Edge: 'fork' key is a non-empty list (should be truthy)
    fname = write_event_file({
        "pull_request": {
            "head": {
                "repo": {
                    "fork": [1, 2, 3]
                }
            }
        }
    })
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()

def test_fork_key_is_empty_list():
    # Edge: 'fork' key is an empty list (should be falsy)
    fname = write_event_file({
        "pull_request": {
            "head": {
                "repo": {
                    "fork": []
                }
            }
        }
    })
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()

def test_fork_key_is_dict():
    # Edge: 'fork' key is a non-empty dict (should be truthy)
    fname = write_event_file({
        "pull_request": {
            "head": {
                "repo": {
                    "fork": {"foo": "bar"}
                }
            }
        }
    })
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()

def test_fork_key_is_empty_dict():
    # Edge: 'fork' key is an empty dict (should be falsy)
    fname = write_event_file({
        "pull_request": {
            "head": {
                "repo": {
                    "fork": {}
                }
            }
        }
    })
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()

# --- LARGE SCALE TEST CASES ---

def test_large_event_file_many_keys_fork_true():
    # Large: Event file with many unrelated keys, fork True
    event = {"pull_request": {"head": {"repo": {"fork": True}}}}
    # Add 900 unrelated keys
    for i in range(900):
        event[f"random_key_{i}"] = i
    fname = write_event_file(event)
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()

def test_large_event_file_many_keys_fork_false():
    # Large: Event file with many unrelated keys, fork False
    event = {"pull_request": {"head": {"repo": {"fork": False}}}}
    for i in range(900):
        event[f"random_key_{i}"] = [i] * 3
    fname = write_event_file(event)
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()

def test_large_nested_structure_fork_true():
    # Large: many unrelated nested keys alongside the expected path, fork True.
    # The function only looks at event["pull_request"]["head"]["repo"]["fork"],
    # so the fork flag must sit at exactly that path at the top level.
    event = {"pull_request": {"head": {"repo": {"fork": True}}}}
    for i in range(500):
        event[f"junk_{i}"] = {"data": i}
    # Write the event file once so no temporary file is leaked.
    fname = write_event_file(event)
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()

def test_large_array_in_repo_object():
    # Large: 'repo' object contains a large array, fork is False
    repo_obj = {"fork": False, "contributors": list(range(999))}
    event = {"pull_request": {"head": {"repo": repo_obj}}}
    fname = write_event_file(event)
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()


def test_large_number_of_irrelevant_keys():
    # Large: Event file with 999 irrelevant top-level keys, correct fork path
    event = {"pull_request": {"head": {"repo": {"fork": True}}}}
    for i in range(999):
        event[f"irrelevant_{i}"] = "x" * 10
    fname = write_event_file(event)
    try:
        codeflash_output = is_repo_a_fork()
    finally:
        remove_event_file(fname)
        unset_event_path()
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
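
The second listing above repeats the same write, call, remove, unset sequence in every test. As a sketch of one way to factor that out using only the standard library, the context manager below writes the event file, points the environment variable at it, and restores the previous state on exit. The name `github_event` is hypothetical and is not part of Codeflash or pytest.

```python
from __future__ import annotations

import contextlib
import json
import os
import tempfile
from collections.abc import Iterator


@contextlib.contextmanager
def github_event(event_data: dict, env_var: str = "GITHUB_EVENT_PATH") -> Iterator[str]:
    """Write event_data to a temp file and expose its path via env_var."""
    previous = os.environ.get(env_var)
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "event.json")
        with open(path, "w") as f:
            json.dump(event_data, f)
        os.environ[env_var] = path
        try:
            yield path
        finally:
            # Restore whatever value (or absence) was there before.
            if previous is None:
                os.environ.pop(env_var, None)
            else:
                os.environ[env_var] = previous
```

A test body would then reduce to `with github_event({...}): codeflash_output = is_repo_a_fork()`. The first listing reaches the same result with pytest's `tmp_path` and `monkeypatch` fixtures, which is usually the simpler choice when pytest is already in use.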

To edit these changes, run git checkout codeflash/optimize-pr572-2025-07-23T11.56.20 and push.

codeflash-ai bot added the "⚡️ codeflash" label (Optimization PR opened by Codeflash AI) on Jul 23, 2025
codeflash-ai bot closed this on Jul 23, 2025
codeflash-ai bot commented Jul 23, 2025

This PR has been automatically closed because the original PR #572 by mohammedahmed18 was closed.

codeflash-ai bot deleted the codeflash/optimize-pr572-2025-07-23T11.56.20 branch on July 23, 2025 16:43